Improving the Template Generation for Chinese Character Error Detection with Confusion Sets

Authors

  • Yong-Zhi Chen
  • Shih-Hung Wu
  • Ping-Che Yang
  • Tsun Ku
Abstract

In this paper, we propose a system that automatically generates templates for detecting Chinese character errors. We first collect a confusion set for each high-frequency Chinese character. Error types include pronunciation-related errors and radical-related errors. With the help of the confusion sets, our system generates possible error patterns in context, which are then used as detection templates. Combined with a word segmentation module, our system generates more accurate templates. The experimental results show that precision approaches 95%. Such a system should not only help teachers grade and check student essays, but also effectively help students learn how to write.
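The template idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `generate_templates` and `detect_errors`, the toy confusion sets, and the two-word dictionary are all hypothetical, and the word segmentation module mentioned in the abstract is left out. The sketch assumes a template is simply a dictionary word with one character swapped for a member of its confusion set, kept only when the result is not itself a valid word.

```python
def generate_templates(words, confusion_sets):
    """Map each plausible erroneous string to the dictionary word it came from.

    words: iterable of correct dictionary words (hypothetical toy lexicon).
    confusion_sets: dict mapping a character to the characters it is
    commonly confused with (similar pronunciation or similar radical).
    """
    vocabulary = set(words)
    templates = {}
    # Sorted iteration keeps the result deterministic across runs.
    for word in sorted(vocabulary):
        for i, ch in enumerate(word):
            for wrong in sorted(confusion_sets.get(ch, ())):
                candidate = word[:i] + wrong + word[i + 1:]
                # A candidate that is itself a valid word is not an error pattern.
                if candidate not in vocabulary:
                    templates[candidate] = word
    return templates


def detect_errors(text, templates):
    """Return sorted (position, erroneous string, suggested word) tuples."""
    hits = []
    for wrong, correct in templates.items():
        start = text.find(wrong)
        while start != -1:
            hits.append((start, wrong, correct))
            start = text.find(wrong, start + 1)
    return sorted(hits)


# Toy data: 在 and 再 share a pronunciation and are often confused.
confusion_sets = {"在": {"再"}, "再": {"在"}}
templates = generate_templates(["現在", "再見"], confusion_sets)
print(detect_errors("我現再很忙", templates))
```

Plain substring matching like this can fire across word boundaries, which is one plausible reason the abstract reports that adding word segmentation yields more accurate templates.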

Similar resources

Reducing the False Alarm Rate of Chinese Character Error Detection and Correction

The main drawback of previous Chinese character error detection systems is the high false alarm rate. To solve this problem, we propose a system that combines a statistical method and template matching to detect Chinese character errors. Error types include pronunciation-related errors and form-related errors. Possible errors of a character can be collected to form a confusion set. Our system auto...

NTOU Chinese Spelling Check System in CLP Bake-off 2014

This paper describes the details of the NTOU Chinese spelling check system participating in the CLP2014 Bakeoff. Confusion sets were expanded by using two language resources, Shuowen and Four-Corner codes. A new method to find spelling errors in legal multi-character words was proposed. Comparison of sentence generation probabilities is the main source of information for error detection and correction. A rule-based c...

中文混淆字集應用於別字偵錯模板自動產生 (Chinese Confusion Word Set for Automatic Generation of Spelling Error Detecting Template) [In Chinese]

In this research, we propose a system that uses automatically generated templates to detect Chinese spelling errors. First, we use frequently used Chinese characters to build the Chinese confusion set. Based on a dictionary, our system automatically generates negative vocabulary templates with the help of the Chinese confusion set. Error types include pronunciation-related errors and rad...

Contextual post-processing based on the confusion matrix in offline handwritten Chinese script recognition

The inclusion of potentially correct characters in candidate sets is key to improving accuracy in the recognition of Chinese scripts in the aspect of contextual post-processing. This paper presents two methods based on a confusion matrix to recall the correct characters. The first method uses original candidates to conjecture the most likely correct characters, and then combines the conjectured...

Phonetic confusion analysis and robust phone set generation for Shanghai-accented Mandarin speech recognition

In this paper, accent issues are discussed for Shanghai-accented Mandarin speech recognition. The phonetic confusion is analyzed in detail based on the alignment between the surface form and the baseform transcriptions. Mutual information is used as the measure to extract the most confusing phoneme pairs. It was found that each phoneme in one pair can be easily misrecognized with the other. To ...

Journal:
  • IJCLCLP

Volume 15

Publication date: 2010